Count-Min Sketch

A random hashing-based method for frequent elements problem.

Solves point query problem: given any value vv, let f(v)=βˆ‘i=1nπŸ™[xi=v]f(v)=\sum_{i=1}^n \mathbb{1}[x_i=v] be the number of times vv appears in the stream.

Goal: return estimate fΜƒ(v)\tilde{f}(v) such that f(v)≀fΜƒ(v)≀f(v)+Ο΅knf(v) \leq \tilde{f}(v) \leq f(v) + \frac{\epsilon}{k}n.

Solving Frequent items: return all items for which fΜƒ(v)β‰₯nk\tilde{f}(v) \geq \frac{n}{k}.

(assume access to a Uniformly Random Hash Function)

Count-Min Update:

return estimate fΜƒ(v)=𝐀[h(v)]\tilde{f}(v) = \mathbf{A}[h(v)]

𝐀[h(v)]=f(v)+βˆ‘yβ‰ vπŸ™[h(y)=h(v)]β‹…f(y)\mathbf{A}[h(v)] = f(v) + \sum_{y \neq v} \mathbb{1}[h(y)=h(v)]\cdot f(y) (this rightward summation term is error in frequency estimate)

Expected error is 𝔼[βˆ‘yβ‰ vπŸ™[h(y)=h(v)]β‹…f(y)]=βˆ‘yβ‰ v𝔼[πŸ™[h(y)=h(v)]β‹…f(y)]=βˆ‘yβ‰ vf(y)𝔼[πŸ™[h(y)=h(v)]]=1mβˆ‘yβ‰ vf(y)≀nm\begin{align*} \mathbb{E}\left[\sum_{y \neq v} \mathbb{1}[h(y)=h(v)]\cdot f(y)\right] = \sum_{y \neq v} \mathbb{E}[\mathbb{1}[h(y)=h(v)]\cdot f(y)] \\ = \sum_{y \neq v} f(y) \mathbb{E}[\mathbb{1}[h(y)=h(v)]] = \frac{1}{m} \sum_{y \neq v} f(y) \leq \frac{n}{m} \end{align*}

Bound of probability of error β‰₯2nm\geq \frac{2n}{m}? Use Markov’s Inequality: Pr[βˆ‘yβ‰ x:h(y)=h(x)f(y)β‰₯2nm]≀12\mathrm{Pr}\left[ \sum_{y \neq x: h(y)=h(x)} f(y) \geq \frac{2n}{m} \right] \leq \frac{1}{2}

Claim: for any vv, with probability at least 1/21/2, f(v)≀A[h(v)]≀f(v)+2nmf(v)\leq A[h(v)] \leq f(v) + \frac{2n}{m}

To solve point query with error Ο΅kn\frac{\epsilon}{k}n, set m=2kΟ΅m=\frac{2k}{\epsilon}.

tt length mm arrays Estimate f(v)f(v) with fΜƒ(v)=mini∈[t]Ai[hi(v)]\tilde{f}(v) = \min_{i∈[t]} A_i[h_i(v)].


See also: (Ξ΅, k)-Frequent Items Problem

References:

  1. G. Cormode and S. Muthukrishnan, β€œAn improved data stream summary: the count-min sketch and its applications,” Journal of Algorithms, vol. 55, no. 1, pp. 58–75, Apr. 2005, doi: 10.1016/j.jalgor.2003.12.001.
  2. https://www.chrismusco.com/amlds2023/notes/lecture01.html#Count-Min_Sketch
  3. https://www.chrismusco.com/amlds2023/lectures/lec1_annotated.pdf ^4e6aed